Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 32409 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 2.3 MiB |
| Average record size in memory | 76.0 B |
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 5 |
df_index is highly correlated with person_age and 1 other fields | High correlation |
person_age is highly correlated with df_index and 1 other fields | High correlation |
loan_amnt is highly correlated with loan_percent_income | High correlation |
loan_percent_income is highly correlated with loan_amnt | High correlation |
cb_person_cred_hist_length is highly correlated with df_index and 1 other fields | High correlation |
cb_person_default_on_file is highly correlated with loan_grade | High correlation |
loan_grade is highly correlated with cb_person_default_on_file | High correlation |
df_index is highly correlated with person_age and 2 other fields | High correlation |
person_age is highly correlated with df_index and 1 other fields | High correlation |
loan_grade is highly correlated with loan_int_rate and 1 other fields | High correlation |
loan_amnt is highly correlated with df_index and 1 other fields | High correlation |
loan_int_rate is highly correlated with loan_grade and 1 other fields | High correlation |
loan_status is highly correlated with loan_percent_income | High correlation |
loan_percent_income is highly correlated with loan_amnt and 1 other fields | High correlation |
cb_person_default_on_file is highly correlated with loan_grade and 1 other fields | High correlation |
cb_person_cred_hist_length is highly correlated with df_index and 1 other fields | High correlation |
df_index is uniformly distributed | Uniform |
df_index has unique values | Unique |
person_emp_length has 4973 (15.3%) zeros | Zeros |
Reproduction
| Analysis started | 2021-12-21 21:29:19.772944 |
|---|---|
| Analysis finished | 2021-12-21 21:29:40.017766 |
| Duration | 20.24 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
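The Duration row follows from the two timestamps above. A minimal sketch of that arithmetic, using only the Python standard library (the variable names are illustrative and not part of the report):

```python
from datetime import datetime

# Timestamp layout used in the "Analysis started" / "Analysis finished" rows.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-12-21 21:29:19.772944", FMT)
finished = datetime.strptime("2021-12-21 21:29:40.017766", FMT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```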
df_index
Real number (ℝ≥0)
HIGH CORRELATION, UNIFORM, UNIQUE
| Distinct | 32409 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16273.36086 |
| Minimum | 1 |
|---|---|
| Maximum | 32580 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 253.3 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1626.4 |
| Q1 | 8108 |
| median | 16229 |
| Q3 | 24434 |
| 95-th percentile | 30952.6 |
| Maximum | 32580 |
| Range | 32579 |
| Interquartile range (IQR) | 16326 |
Descriptive statistics
| Standard deviation | 9415.74565 |
|---|---|
| Coefficient of variation (CV) | 0.5785987131 |
| Kurtosis | -1.204918266 |
| Mean | 16273.36086 |
| Median Absolute Deviation (MAD) | 8163 |
| Skewness | 0.002645606033 |
| Sum | 527403352 |
| Variance | 88656266.15 |
| Monotonicity | Strictly increasing |
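Several of the df_index figures above are derived from one another. The sketch below re-derives them from the tabulated values as a consistency check. The variable names are illustrative, and the inputs are copied by hand from the tables, so the results agree with the report only up to rounding.

```python
from math import isclose

# Values copied from the df_index tables in this report.
n_observations = 32409
minimum, maximum = 1, 32580
q1, q3 = 8108, 24434
mean = 16273.36086
std = 9415.74565

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n_observations     # Sum

print(value_range, iqr, round(cv, 6), round(variance, 2), round(total))
```

A kurtosis close to -1.2 and a skewness close to 0, as reported above, are what a uniform distribution would give, which is consistent with the Uniform alert on this column.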
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% |
| 21729 | 1 | < 0.1% |
| 21742 | 1 | < 0.1% |
| 21741 | 1 | < 0.1% |
| 21740 | 1 | < 0.1% |
| 21739 | 1 | < 0.1% |
| 21738 | 1 | < 0.1% |
| 21737 | 1 | < 0.1% |
| 21736 | 1 | < 0.1% |
| 21735 | 1 | < 0.1% |
| Other values (32399) | 32399 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 32580 | 1 | |
| 32579 | 1 | |
| 32578 | 1 | |
| 32577 | 1 | |
| 32576 | 1 | |
| 32575 | 1 | |
| 32574 | 1 | |
| 32573 | 1 | |
| 32572 | 1 | |
| 32571 | 1 |
person_age
Real number (ℝ≥0)
| Distinct | 56 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 27.7307538 |
| Minimum | 20 |
|---|---|
| Maximum | 94 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 253.3 KiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 22 |
| Q1 | 23 |
| median | 26 |
| Q3 | 30 |
| 95-th percentile | 40 |
| Maximum | 94 |
| Range | 74 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 6.210445205 |
|---|---|
| Coefficient of variation (CV) | 0.2239551528 |
| Kurtosis | 5.867251216 |
| Mean | 27.7307538 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 1.942359328 |
| Sum | 898726 |
| Variance | 38.56962965 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 23 | 3861 | |
| 22 | 3606 | |
| 24 | 3526 | |
| 25 | 3023 | 9.3% |
| 26 | 2462 | 7.6% |
| 27 | 2127 | 6.6% |
| 28 | 1848 | 5.7% |
| 29 | 1682 | 5.2% |
| 30 | 1310 | 4.0% |
| 21 | 1212 | 3.7% |
| Other values (46) | 7752 |
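The Frequency (%) column appears to be each count divided by the number of observations (32409, with no missing values). A small sketch under that assumption, which can also recover percentages for rows where the rendered cell is blank. `frequency_percent` is a hypothetical helper, not a pandas-profiling function.

```python
N_OBSERVATIONS = 32409  # "Number of observations" from the dataset statistics

# Counts copied from the person_age table above.
age_counts = {23: 3861, 22: 3606, 24: 3526, 25: 3023, 26: 2462}

def frequency_percent(count, total=N_OBSERVATIONS):
    """Share of observations, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)

age_freqs = {age: frequency_percent(count) for age, count in age_counts.items()}

for age, freq in age_freqs.items():
    print(f"{age}: {freq}%")
```

The values for ages 25 and 26 reproduce the 9.3% and 7.6% shown in the table, which supports the assumption.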
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 15 | < 0.1% |
| 21 | 1212 | 3.7% |
| 22 | 3606 | |
| 23 | 3861 | |
| 24 | 3526 | |
| 25 | 3023 | |
| 26 | 2462 | |
| 27 | 2127 | |
| 28 | 1848 | |
| 29 | 1682 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 94 | 1 | < 0.1% |
| 84 | 1 | < 0.1% |
| 80 | 1 | < 0.1% |
| 78 | 1 | < 0.1% |
| 76 | 1 | < 0.1% |
| 73 | 3 | < 0.1% |
| 70 | 7 | |
| 69 | 5 | |
| 67 | 1 | < 0.1% |
| 66 | 9 |
person_income
Real number (ℝ≥0)
| Distinct | 4294 |
|---|---|
| Distinct (%) | 13.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 65894.27705 |
| Minimum | 4000 |
|---|---|
| Maximum | 2039784 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 253.3 KiB |
Quantile statistics
| Minimum | 4000 |
|---|---|
| 5-th percentile | 22915.2 |
| Q1 | 38500 |
| median | 55000 |
| Q3 | 79200 |
| 95-th percentile | 138000 |
| Maximum | 2039784 |
| Range | 2035784 |
| Interquartile range (IQR) | 40700 |
Descriptive statistics
| Standard deviation | 52517.86997 |
|---|---|
| Coefficient of variation (CV) | 0.7970019905 |
| Kurtosis | 226.0857593 |
| Mean | 65894.27705 |
| Median Absolute Deviation (MAD) | 19337 |
| Skewness | 9.77787418 |
| Sum | 2135567625 |
| Variance | 2758126667 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 60000 | 1040 | 3.2% |
| 30000 | 844 | 2.6% |
| 50000 | 772 | 2.4% |
| 40000 | 655 | 2.0% |
| 45000 | 586 | 1.8% |
| 75000 | 577 | 1.8% |
| 65000 | 529 | 1.6% |
| 48000 | 527 | 1.6% |
| 70000 | 525 | 1.6% |
| 42000 | 520 | 1.6% |
| Other values (4284) | 25834 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4000 | 1 | < 0.1% |
| 4080 | 1 | < 0.1% |
| 4200 | 2 | |
| 4800 | 3 | |
| 4888 | 1 | < 0.1% |
| 5000 | 2 | |
| 5500 | 1 | < 0.1% |
| 6000 | 4 | |
| 7000 | 1 | < 0.1% |
| 7200 | 3 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2039784 | 1 | < 0.1% |
| 1900000 | 1 | < 0.1% |
| 1782000 | 1 | < 0.1% |
| 1440000 | 1 | < 0.1% |
| 1362000 | 1 | < 0.1% |
| 1200000 | 3 | |
| 948000 | 1 | < 0.1% |
| 900000 | 4 | |
| 889000 | 1 | < 0.1% |
| 828000 | 2 |
person_home_ownership
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 32.0 KiB |
Length
| Max length | 8 |
|---|---|
| Median length | 4 |
| Mean length | 5.573852942 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | OWN |
|---|---|
| 2nd row | MORTGAGE |
| 3rd row | RENT |
| 4th row | RENT |
| 5th row | OWN |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| RENT | 16374 | |
| MORTGAGE | 13366 | |
| OWN | 2563 | 7.9% |
| OTHER | 106 | 0.3% |
person_emp_length
Real number (ℝ≥0)
| Distinct | 35 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.651948533 |
| Minimum | 0 |
|---|---|
| Maximum | 41 |
| Zeros | 4973 |
| Zeros (%) | 15.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 253.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 4 |
| Q3 | 7 |
| 95-th percentile | 12 |
| Maximum | 41 |
| Range | 41 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 4.057458746 |
|---|---|
| Coefficient of variation (CV) | 0.8722062845 |
| Kurtosis | 2.375902037 |
| Mean | 4.651948533 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 1.249412938 |
| Sum | 150765 |
| Variance | 16.46297147 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=35)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4973 | |
| 2 | 3831 | |
| 3 | 3442 | |
| 5 | 2926 | |
| 1 | 2897 | |
| 4 | 2861 | |
| 6 | 2652 | |
| 7 | 2185 | |
| 8 | 1676 | 5.2% |
| 9 | 1359 | 4.2% |
| Other values (25) | 3607 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4973 | |
| 1 | 2897 | |
| 2 | 3831 | |
| 3 | 3442 | |
| 4 | 2861 | |
| 5 | 2926 | |
| 6 | 2652 | |
| 7 | 2185 | |
| 8 | 1676 | 5.2% |
| 9 | 1359 | 4.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 41 | 1 | < 0.1% |
| 38 | 1 | < 0.1% |
| 34 | 1 | < 0.1% |
| 31 | 4 | |
| 30 | 2 | < 0.1% |
| 29 | 1 | < 0.1% |
| 28 | 3 | < 0.1% |
| 27 | 5 | |
| 26 | 6 | |
| 25 | 8 |
loan_intent
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 32.0 KiB |
Length
| Max length | 17 |
|---|---|
| Median length | 8 |
| Mean length | 10.05334938 |
| Min length | 7 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | EDUCATION |
|---|---|
| 2nd row | MEDICAL |
| 3rd row | MEDICAL |
| 4th row | MEDICAL |
| 5th row | VENTURE |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| EDUCATION | 6409 | |
| MEDICAL | 6042 | |
| VENTURE | 5679 | |
| PERSONAL | 5496 | |
| DEBTCONSOLIDATION | 5189 | |
| HOMEIMPROVEMENT | 3594 |
loan_grade
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 32.1 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | B |
|---|---|
| 2nd row | C |
| 3rd row | C |
| 4th row | C |
| 5th row | A |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| A | 10702 | |
| B | 10384 | |
| C | 6436 | |
| D | 3619 | 11.2% |
| E | 963 | 3.0% |
| F | 241 | 0.7% |
| G | 64 | 0.2% |
loan_amnt
Real number (ℝ≥0)
| Distinct | 753 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9592.486655 |
| Minimum | 500 |
|---|---|
| Maximum | 35000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 253.3 KiB |
Quantile statistics
| Minimum | 500 |
|---|---|
| 5-th percentile | 2000 |
| Q1 | 5000 |
| median | 8000 |
| Q3 | 12250 |
| 95-th percentile | 24000 |
| Maximum | 35000 |
| Range | 34500 |
| Interquartile range (IQR) | 7250 |
Descriptive statistics
| Standard deviation | 6320.885127 |
|---|---|
| Coefficient of variation (CV) | 0.6589412479 |
| Kurtosis | 1.419602673 |
| Mean | 9592.486655 |
| Median Absolute Deviation (MAD) | 3800 |
| Skewness | 1.191488821 |
| Sum | 310882900 |
| Variance | 39953588.79 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10000 | 2649 | 8.2% |
| 5000 | 2032 | 6.3% |
| 12000 | 1795 | 5.5% |
| 6000 | 1794 | 5.5% |
| 15000 | 1496 | 4.6% |
| 8000 | 1441 | 4.4% |
| 4000 | 1062 | 3.3% |
| 3000 | 1027 | 3.2% |
| 20000 | 1013 | 3.1% |
| 7000 | 983 | 3.0% |
| Other values (743) | 17117 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 500 | 5 | < 0.1% |
| 700 | 1 | < 0.1% |
| 725 | 1 | < 0.1% |
| 750 | 1 | < 0.1% |
| 800 | 1 | < 0.1% |
| 900 | 2 | < 0.1% |
| 950 | 1 | < 0.1% |
| 1000 | 315 | |
| 1050 | 4 | < 0.1% |
| 1075 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 35000 | 183 | |
| 34800 | 1 | < 0.1% |
| 34000 | 4 | < 0.1% |
| 33950 | 2 | < 0.1% |
| 33250 | 1 | < 0.1% |
| 33000 | 6 | < 0.1% |
| 32500 | 1 | < 0.1% |
| 32400 | 1 | < 0.1% |
| 32000 | 10 | < 0.1% |
| 31825 | 2 | < 0.1% |
loan_int_rate
Real number (ℝ≥0)
| Distinct | 355 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.01600626 |
| Minimum | 5.42 |
|---|---|
| Maximum | 23.22 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 253.3 KiB |
Quantile statistics
| Minimum | 5.42 |
|---|---|
| 5-th percentile | 6.03 |
| Q1 | 7.88 |
| median | 10.995756 |
| Q3 | 13.464579 |
| 95-th percentile | 16.32 |
| Maximum | 23.22 |
| Range | 17.8 |
| Interquartile range (IQR) | 5.584579 |
Descriptive statistics
| Standard deviation | 3.220489389 |
|---|---|
| Coefficient of variation (CV) | 0.2923463651 |
| Kurtosis | -0.6811872023 |
| Mean | 11.01600626 |
| Median Absolute Deviation (MAD) | 2.505756 |
| Skewness | 0.2093099439 |
| Sum | 357017.7469 |
| Variance | 10.3715519 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.995756 | 1049 | 3.2% |
| 7.328423 | 990 | 3.1% |
| 7.51 | 754 | 2.3% |
| 10.99 | 745 | 2.3% |
| 7.49 | 638 | 2.0% |
| 7.88 | 636 | 2.0% |
| 13.464579 | 629 | 1.9% |
| 5.42 | 588 | 1.8% |
| 7.9 | 566 | 1.7% |
| 11.49 | 486 | 1.5% |
| Other values (345) | 25328 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.42 | 588 | |
| 5.79 | 395 | |
| 5.99 | 353 | |
| 6 | 12 | < 0.1% |
| 6.03 | 444 | |
| 6.17 | 214 | 0.7% |
| 6.39 | 62 | 0.2% |
| 6.54 | 249 | |
| 6.62 | 412 | |
| 6.76 | 176 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 23.22 | 1 | < 0.1% |
| 22.48 | 1 | < 0.1% |
| 22.11 | 3 | |
| 22.06 | 1 | < 0.1% |
| 21.74 | 5 | |
| 21.64 | 1 | < 0.1% |
| 21.36 | 5 | |
| 21.27 | 3 | |
| 21.21 | 4 | |
| 21.14 | 1 | < 0.1% |
loan_status
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 253.3 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 25321 | |
| 1 | 7088 | 21.9% |
loan_percent_income
Real number (ℝ≥0)
| Distinct | 77 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1702480792 |
| Minimum | 0 |
|---|---|
| Maximum | 0.83 |
| Zeros | 8 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 253.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.04 |
| Q1 | 0.09 |
| median | 0.15 |
| Q3 | 0.23 |
| 95-th percentile | 0.38 |
| Maximum | 0.83 |
| Range | 0.83 |
| Interquartile range (IQR) | 0.14 |
Descriptive statistics
| Standard deviation | 0.1067849722 |
|---|---|
| Coefficient of variation (CV) | 0.6272315826 |
| Kurtosis | 1.215130579 |
| Mean | 0.1702480792 |
| Median Absolute Deviation (MAD) | 0.07 |
| Skewness | 1.063283332 |
| Sum | 5517.57 |
| Variance | 0.01140303028 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1 | 1522 | 4.7% |
| 0.13 | 1468 | 4.5% |
| 0.08 | 1432 | 4.4% |
| 0.07 | 1390 | 4.3% |
| 0.11 | 1375 | 4.2% |
| 0.09 | 1372 | 4.2% |
| 0.12 | 1288 | 4.0% |
| 0.14 | 1284 | 4.0% |
| 0.06 | 1281 | 4.0% |
| 0.17 | 1254 | 3.9% |
| Other values (67) | 18743 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8 | < 0.1% |
| 0.01 | 138 | 0.4% |
| 0.02 | 368 | 1.1% |
| 0.03 | 773 | |
| 0.04 | 970 | |
| 0.05 | 1172 | |
| 0.06 | 1281 | |
| 0.07 | 1390 | |
| 0.08 | 1432 | |
| 0.09 | 1372 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.83 | 1 | < 0.1% |
| 0.78 | 1 | < 0.1% |
| 0.77 | 2 | |
| 0.76 | 1 | < 0.1% |
| 0.72 | 1 | < 0.1% |
| 0.71 | 3 | |
| 0.7 | 3 | |
| 0.69 | 2 | |
| 0.68 | 3 | |
| 0.67 | 4 |
cb_person_default_on_file
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | N |
|---|---|
| 2nd row | N |
| 3rd row | N |
| 4th row | Y |
| 5th row | N |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| N | 26680 | |
| Y | 5729 | 17.7% |
cb_person_cred_hist_length
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 29 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.811194421 |
| Minimum | 2 |
|---|---|
| Maximum | 30 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 253.3 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 4 |
| Q3 | 8 |
| 95-th percentile | 14 |
| Maximum | 30 |
| Range | 28 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 4.057898738 |
|---|---|
| Coefficient of variation (CV) | 0.6982899632 |
| Kurtosis | 3.699456761 |
| Mean | 5.811194421 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.657990783 |
| Sum | 188335 |
| Variance | 16.46654217 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=29)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 5924 | |
| 3 | 5902 | |
| 4 | 5879 | |
| 7 | 1898 | 5.9% |
| 8 | 1893 | 5.8% |
| 9 | 1888 | 5.8% |
| 5 | 1875 | 5.8% |
| 6 | 1849 | 5.7% |
| 10 | 1846 | 5.7% |
| 14 | 492 | 1.5% |
| Other values (19) | 2963 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 5924 | |
| 3 | 5902 | |
| 4 | 5879 | |
| 5 | 1875 | 5.8% |
| 6 | 1849 | 5.7% |
| 7 | 1898 | 5.9% |
| 8 | 1893 | 5.8% |
| 9 | 1888 | 5.8% |
| 10 | 1846 | 5.7% |
| 11 | 462 | 1.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 30 | 22 | |
| 29 | 14 | |
| 28 | 27 | |
| 27 | 22 | |
| 26 | 16 | |
| 25 | 17 | |
| 24 | 30 | |
| 23 | 22 | |
| 22 | 22 | |
| 21 | 20 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
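The description above can be sketched in plain Python: rank both variables (ties receive the average of their positions), then apply the covariance-over-standard-deviations formula to the ranks. The helper names and toy data are hypothetical, and this is an illustration rather than the code the report itself runs.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)   # close to 1: perfectly monotonic
r = pearson(x, y)          # below 1: not perfectly linear
```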
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
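A minimal sketch of that formula, together with a check of the location and scale invariance on toy data. The function name `pearson_r` and the sample values are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [2 * a + 5 for a in x]
y_shifted = [0.5 * b - 3 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r, r_shifted)
```

Both calls should give the same coefficient, since only the relative positions of the points around their means matter.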
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
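The pair-counting definition above can be sketched directly. This is the simple form in which tied pairs count as neither concordant nor discordant; libraries often report a tie-adjusted variant instead, so values may differ on data with ties. The function name `kendall_tau_a` and the toy data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1    # both variables move in the same direction
        elif sign < 0:
            discordant += 1    # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]                    # one swapped pair out of six
tau = kendall_tau_a(x, y)
tau_reversed = kendall_tau_a(x, x[::-1])
```

The quadratic loop over all pairs is fine for a small illustration but would be slow on a column with tens of thousands of rows.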
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
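For the bias-corrected Cramér's V mentioned in the correlation notes above, a minimal standard-library sketch follows. It assumes the correction attributed to Bergsma (2013) in its commonly stated form; the function name `cramers_v_corrected` and the toy labels are hypothetical, and the report's own implementation may differ in detail.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    row_totals = Counter(a)
    col_totals = Counter(b)
    observed = Counter(zip(a, b))

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_value, row_total in row_totals.items():
        for col_value, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row_value, col_value)] - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)

    # Bias correction applied to phi squared and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)

    denominator = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denominator) if denominator > 0 else 0.0

# Toy labels: identical columns (perfect association) and unrelated columns.
same = ["N", "Y"] * 50
v_perfect = cramers_v_corrected(same, same)
v_independent = cramers_v_corrected(["A", "A", "B", "B"] * 25, ["N", "Y", "N", "Y"] * 25)
```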
First rows
| | df_index | person_age | person_income | person_home_ownership | person_emp_length | loan_intent | loan_grade | loan_amnt | loan_int_rate | loan_status | loan_percent_income | cb_person_default_on_file | cb_person_cred_hist_length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 21 | 9600 | OWN | 5.0 | EDUCATION | B | 1000 | 11.14 | 0 | 0.10 | N | 2 |
| 1 | 2 | 25 | 9600 | MORTGAGE | 1.0 | MEDICAL | C | 5500 | 12.87 | 1 | 0.57 | N | 3 |
| 2 | 3 | 23 | 65500 | RENT | 4.0 | MEDICAL | C | 35000 | 15.23 | 1 | 0.53 | N | 2 |
| 3 | 4 | 24 | 54400 | RENT | 8.0 | MEDICAL | C | 35000 | 14.27 | 1 | 0.55 | Y | 4 |
| 4 | 5 | 21 | 9900 | OWN | 2.0 | VENTURE | A | 2500 | 7.14 | 1 | 0.25 | N | 2 |
| 5 | 6 | 26 | 77100 | RENT | 8.0 | EDUCATION | B | 35000 | 12.42 | 1 | 0.45 | N | 3 |
| 6 | 7 | 24 | 78956 | RENT | 5.0 | MEDICAL | B | 35000 | 11.11 | 1 | 0.44 | N | 4 |
| 7 | 8 | 24 | 83000 | RENT | 8.0 | PERSONAL | A | 35000 | 8.90 | 1 | 0.42 | N | 2 |
| 8 | 9 | 21 | 10000 | OWN | 6.0 | VENTURE | D | 1600 | 14.74 | 1 | 0.16 | N | 3 |
| 9 | 10 | 22 | 85000 | RENT | 6.0 | VENTURE | B | 35000 | 10.37 | 1 | 0.41 | N | 4 |
Last rows
| | df_index | person_age | person_income | person_home_ownership | person_emp_length | loan_intent | loan_grade | loan_amnt | loan_int_rate | loan_status | loan_percent_income | cb_person_default_on_file | cb_person_cred_hist_length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32399 | 32571 | 60 | 45600 | RENT | 1.0 | VENTURE | B | 20000 | 10.00 | 1 | 0.44 | N | 26 |
| 32400 | 32572 | 52 | 52000 | OWN | 0.0 | PERSONAL | A | 9600 | 8.49 | 0 | 0.18 | N | 22 |
| 32401 | 32573 | 56 | 90000 | MORTGAGE | 0.0 | PERSONAL | A | 7200 | 6.17 | 0 | 0.08 | N | 19 |
| 32402 | 32574 | 52 | 65004 | RENT | 4.0 | PERSONAL | D | 20000 | 15.58 | 1 | 0.31 | Y | 19 |
| 32403 | 32575 | 52 | 64500 | RENT | 0.0 | EDUCATION | B | 5000 | 11.26 | 0 | 0.08 | N | 20 |
| 32404 | 32576 | 57 | 53000 | MORTGAGE | 1.0 | PERSONAL | C | 5800 | 13.16 | 0 | 0.11 | N | 30 |
| 32405 | 32577 | 54 | 120000 | MORTGAGE | 4.0 | PERSONAL | A | 17625 | 7.49 | 0 | 0.15 | N | 19 |
| 32406 | 32578 | 65 | 76000 | RENT | 3.0 | HOMEIMPROVEMENT | B | 35000 | 10.99 | 1 | 0.46 | N | 28 |
| 32407 | 32579 | 56 | 150000 | MORTGAGE | 5.0 | PERSONAL | B | 15000 | 11.48 | 0 | 0.10 | N | 26 |
| 32408 | 32580 | 66 | 42000 | RENT | 2.0 | MEDICAL | B | 6475 | 9.99 | 0 | 0.15 | N | 30 |